ABSTRACT

This investigation examined the relationship between 
problem solving ability and the criteria used to decide whether two 
classical mechanics problems could be solved similarly. The 
investigators began by comparing experts and novices on a similarity 
judgment task and found that experts predominantly relied on the 
problems' deep structure in deciding similarity of solution, although 
the presence of surface feature similarity had a clear adverse effect 
on performance. Novices relied predominantly on surface features, but 
were capable of using the problems' deep structure under certain
conditions. In a second experiment, groups of novices who tended to 
employ different types of reasoning in making similarity judgments 
were compared. Compared to novices who relied predominantly on 
surface features, novices who made greater use of principles tended 
to categorize problems similarly to experts, as well as score higher 
in problem solving. These results suggest that principles play a 
fundamental role in the organization of conceptual and procedural 
knowledge for good problem solvers at all levels. (Author/CW) 
Abstract

This investigation examined the relationship between problem solving
ability and the criteria used to decide that two classical mechanics problems
would be solved similarly. We began by comparing experts and novices on a
similarity judgment task and found that experts predominantly relied on
problems' deep structure in deciding similarity of solution, although the
presence of surface feature similarity had a clear adverse effect on
performance. Novices relied predominantly on surface features, but were
capable of using problems' deep structure under certain conditions. In a
second experiment, we compared groups of novices at the same level of
experience who tended to employ different types of reasoning in making
similarity judgments. Compared to novices who relied predominantly on surface
features, novices who made greater use of principles tended to categorize
problems similarly to experts, as well as score higher in problem solving.
These results suggest that principles play a fundamental role in the
organization of conceptual and procedural knowledge for good problem solvers
at all levels.
What is the relationship between problem solving ability and the
criteria one uses to decide whether or not two problems would be solved
similarly? To date, attempts to answer this question have focused on
investigating the end points of the spectrum of problem solving skill, namely
experts and novices. For experts, categorization of a problem as a type
suggests possible solution strategies and can directly influence ability to
generate a successful solution (Hayes & Simon, 1976; Hinsley, Hayes & Simon,
1978; Newell & Simon, 1972; Simon & Simon, 1978). Research in domains such as
mathematics (Schoenfeld & Herrmann, 1982) and physics (Chi, Feltovich &
Glaser, 1981) indicates that experts tend to focus on the deep structure of a
problem (e.g., principles and concepts that could be used to solve the
problem) to decide whether or not two problems would be solved similarly.
These findings suggest that when attempting to solve a problem, experts first
consider what principle(s) applies most appropriately to the situation, and
then decide on a strategy or procedure that will be used to instantiate the
principle (Larkin, 1983, 1981; Larkin, McDermott, Simon & Simon, 1980; Simon &
Simon, 1978).

The picture is different for novices. When asked to categorize problems
into types according to similarity of solution, novices tend to cue on surface
features (e.g., problem jargon and descriptor terms) as the primary criterion
of similarity (Chi, et al., 1981; Schoenfeld & Herrmann, 1981). When asked to
state the general approach they would take to solve a problem, novices usually
relate detailed information (e.g., equations and facts), rather than more
general principles and concepts (Chi, et al., 1981). However, as problem
solving skills develop, reliance on deep structure to categorize problems
increases (Niegemann & Paar, 1986).
Although we can conclude that both surface features and deep structure
are important, and perhaps competing, attributes used in judging word problem
similarity, it may be inappropriate to construe the use of these features as
dichotomous, i.e., to assume that experts use deep structure exclusively and
that novices use surface features exclusively. What is clear is that the
extent to which problem solvers rely on each type of attribute seems to be
related to problem solving ability. However, little is known about either how
surface features and deep structure interact in generating a problem
categorization, or how problem categorization is related to problem solving
ability among novices.
The two experiments we report here investigate these issues. In Experiment 1,
we designed a similarity judgment task that allowed us to examine the relative
contributions of surface features and deep structure in experts' and novices'
categorization decisions in the domain of classical mechanics. The results of
Experiment 1 suggested there may be individual differences in the
categorization schemes used by novices. In Experiment 2, the similarity
judgment task was refined in order to investigate these possible differences
and to assess the relationship between categorization schemes and problem
solving ability.

EXPERIMENT 1: THE INTERACTION OF SURFACE FEATURES AND 
DEEP STRUCTURE IN PROBLEM CATEGORIZATION 
In order to study the influence and interaction of surface features and
deep structure in categorization, we designed a similarity judgment task
similar to those used in studies of object categorization (Rosch and Mervis,
1975; Mervis, 1980). In our task, a model problem and two comparison problems
are presented, and the subject must decide which of the comparison problems
would be solved most like the model problem. The comparison problems differ
in which attributes match the model problem, making it possible to investigate
systematically the interaction between surface feature and deep structure
attributes in subjects' similarity judgments. We think this type of task
represents an advancement over the card sorting task commonly used in problem
categorization experiments, in which a subject sorts word problems written on
index cards into several piles which are then labeled by the subject to
indicate the relationship among all the cards in a particular pile. This task
requires that the subject develop a categorization scheme dealing with all
problems simultaneously. In contrast, the similarity judgment task focuses
the subject's attention on specific problems, allowing problem attributes to
be systematically varied. This simplifies the prediction of outcomes based on
models of expert and novice performance, as well as the data analysis and
interpretation of results.

In our similarity judgment task, a given comparison problem could match
the model problem in surface features (S), deep structure (D), both surface
features and deep structure (SD), or neither surface features nor deep
structure (N). These comparison problems were paired in such a way that one
and only one problem in the pair matched the model problem in deep structure.
This led to four types of pairings, which we will refer to as "comparison
types": 1) S-D, 2) S-SD, 3) N-D, and 4) N-SD. If it is the case that experts
and novices rely primarily on different kinds of problem attributes in making
similarity judgments, then the patterns of performance expected for experts
and novices should differ. Assuming that experts base their categorization
decisions solely on deep structure, they should choose the comparison
problem that matches the model problem in deep structure 100% of the time, and
select D, SD, D, and SD respectively in the four comparison types. Novices
who base their categorization decisions solely on surface features should
choose the comparison problem that matches the model problem in surface
features whenever it is possible to do so. Thus, they should choose S in the
S-D comparison type and SD in the N-SD comparison type. When surface features
do not allow a distinction to be made, as in the S-SD and the N-D items,
either alternative should be equally likely. Hence, novices' choices should
match the model problem in deep structure 0%, 50%, 50%, and 100% of the time
for S-D, S-SD, N-D, and N-SD comparison types respectively, and 50% of the
time overall.
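
The predictions stated above can be reproduced with a small illustrative sketch. It is not part of the original study; the data structure and the function names (`p_deep_choice`, `predictions`) are hypothetical, and the only assumptions are the ones given in the text: a subject who relies on a single attribute picks the alternative that matches on it, and guesses at chance when that attribute does not discriminate.

```python
# Each comparison type is a pair of alternatives described as
# (matches_surface, matches_deep). Exactly one alternative per pair
# matches the model problem in deep structure.
COMPARISON_TYPES = {
    "S-D":  [(True, False), (False, True)],
    "S-SD": [(True, False), (True, True)],
    "N-D":  [(False, False), (False, True)],
    "N-SD": [(False, False), (True, True)],
}

SURFACE, DEEP = 0, 1  # index of the attribute a subject relies on


def p_deep_choice(pair, attribute):
    """Probability of choosing the deep-structure match when deciding
    on a single attribute (idealized model, assumed for illustration)."""
    matches = [alternative[attribute] for alternative in pair]
    if matches[0] == matches[1]:
        # The attribute does not distinguish the alternatives: coin flip.
        return 0.5
    chosen = pair[0] if matches[0] else pair[1]
    return 1.0 if chosen[DEEP] else 0.0


def predictions(attribute):
    """Predicted rate of deep-structure choices for each comparison type."""
    return {name: p_deep_choice(pair, attribute)
            for name, pair in COMPARISON_TYPES.items()}


surface_model = predictions(SURFACE)   # idealized novice
deep_model = predictions(DEEP)         # idealized expert
overall_surface = sum(surface_model.values()) / len(surface_model)

print(surface_model)
print(deep_model)
print(overall_surface)
```

Under these assumptions the surface-feature model yields 0%, 50%, 50%, and 100% for the four comparison types (50% overall), and the deep-structure model yields 100% throughout.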



Method 

Subjects 

The novice subjects were 45 undergraduate students at the University of
Massachusetts who had completed the first semester physics course in classical
mechanics and received a grade of B or better. The expert subjects were 8
Ph.D. physicists and 2 advanced physics graduate students who were nearing
completion of the Ph.D. requirements. The novice subjects performed both a
categorization task and a problem solving task, and were paid for their time.
The expert subjects volunteered their time and only performed the
categorization task.
Categorization Task

Each item on the categorization task was composed of three elementary
mechanics problems similar in style and level of difficulty to problems in an
introductory mechanics text (i.e., Resnick and Halliday, 1977). Each word
problem was three to five lines long and contained neither pictures nor
diagrams. For each item, one problem was designated as the model problem,
while the other two were designated as comparison problems. Subjects were to
indicate which of the two comparison problems "would be solved most similarly"
to the model problem. A response was considered correct if the subject chose
the comparison problem that matched the model problem in deep structure (i.e.,
the physical principle that would be applied to solve both problems was the
same).

There were eight model problems, two dealing with energy principles, two
dealing with momentum principles, two dealing with angular momentum principles
and two dealing with Newton's Second Law or Kinematics. Each model problem
appeared four times, once with each of the four comparison types. This
yielded 32 items composed of one model problem and two comparison problems.
The following is a sample model problem and the four comparison problems that
were constructed to accompany it:
Model Problem

A 2.5 kg ball of radius 4 cm is traveling at 7 m/s on a rough horizontal
surface, but not spinning. Some distance later, the ball is rolling
without slipping at 5 m/s. How much work was done by friction?
S Comparison Problem

A 3 kg soccer ball of radius 15 cm is initially sliding at 10 m/s
without spinning. The ball travels on a rough horizontal surface
and eventually rolls without slipping. Find the ball's velocity.
D Comparison Problem

A small rock of mass 10 g falling vertically hits a very thick
layer of snow and penetrates 2 meters before coming to rest. If
the rock's speed was 25 m/s just prior to hitting the snow, find
the average force exerted on the rock by the snow.
SD Comparison Problem

A 0.5 kg billiard ball of radius 2 cm rolls without slipping down
an inclined plane. If the billiard ball is initially at rest,
what is its speed after it has moved through a vertical distance
of .5 m?
N Comparison Problem

A 2 kg projectile is fired with an initial velocity of 1500 m/sec
at an angle of 30 degrees above the horizontal and height 100 m
above the ground. Find the time needed for the projectile to
reach the ground.

The experiment was run on IBM compatible PCs. The subject was told to
read the model problem carefully and the two comparison problems that would
appear below it, and decide which comparison problem would be solved most like
the model problem. The items were presented in random order, with no limit
imposed on response time. Most subjects completed the task within 45 minutes.

Problem Solving Task 

In a separate hour-long session, the novice subjects were given a
problem solving task containing seven classical mechanics problems. Four
problems required the application of one principle for solution, whereas three
problems required the application of two principles. Henceforth we will only
discuss performance on the single-principle problems, since few subjects were
able to solve the two-principle problems. The single-principle problems were
all similar in style and level of difficulty to both the problems appearing in
the textbook and the problems used in the categorization task. The principles
involved in the four problems were: Newton's Second Law, Conservation of
Energy, Conservation of Linear Momentum, and Conservation of Angular Momentum.
Each problem was graded on a ten point scale by two physicists; whenever the
score on a problem differed by two or more points, the solution was discussed
and a score was determined by consensus. The total scores on the problem
solving task ranged from 1 point to 40 points, with a mean of 21.2 points and
standard deviation 11.42 points.

Results

The performances of the 45 novices and the 10 experts were compared in a
2 (Groups) by 4 (Comparison Types) by 8 (Model Problems) analysis of variance.
In general, the experts were better able to determine whether two problems
would be solved through application of the same principle, choosing the
comparison problem that matched the model problem in deep structure 78% of the
time. Novices chose the deep structure alternative 59% of the time, which was
significantly less often than experts, F(1,53)=28.78, p<.0001. As predicted
by our assumptions about expert and novice performance, there was a difference
in how the two groups responded to the four comparison types, as indicated by
the Group x Comparison Type interaction, F(3,159)=10.00, p<.0001.
Therefore, we will discuss the influence of Comparison Type for experts and
novices separately.
Experts 

Comparison Type should have had no influence on experts' performance had
they based their decisions about solution similarity strictly on deep
structure (i.e., the principle involved). However, there was a significant
main effect of Comparison Type for experts, F(3,27)=10.56, p=.0001,
indicating that the four Comparison Types were not of equal difficulty. The
mean performances for the Comparison Types (see Table 1) suggest that surface
features have an adverse influence on experts' categorization decisions.
Although the differences among these means were not all significant, they do
follow the trend predicted for novices, suggesting that experts were adversely
affected by the same kinds of conditions that negatively influenced novices.
Performance on the N-SD items was significantly better than that on each of
the other three Comparison Types (p<.005; Bonferroni familywise error rate:
.05/6 = .008). The mean performance of the experts was significantly
higher than that of the novices on all Comparison
Types, except for the N-SD type, where performance was quite high for both
groups (see Table 1). Thus, although experts appear to focus on deep
structure to a greater degree than novices, surface features do interfere with
their performance.
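
The per-comparison threshold of .05/6 = .008 quoted above follows from a Bonferroni correction over all pairwise comparisons among the four Comparison Types. A minimal sketch of that arithmetic is given below; the variable and function names are illustrative assumptions, and no data from the study are used.

```python
from itertools import combinations

comparison_types = ["S-D", "S-SD", "N-D", "N-SD"]
familywise_alpha = 0.05

# All unordered pairs of the four comparison types: C(4, 2) = 6 comparisons.
pairs = list(combinations(comparison_types, 2))

# Bonferroni: divide the familywise error rate by the number of comparisons.
per_comparison_alpha = familywise_alpha / len(pairs)


def is_significant(p_value, alpha=per_comparison_alpha):
    """True if a single pairwise comparison survives the correction."""
    return p_value < alpha


print(len(pairs), round(per_comparison_alpha, 3))
```

With six comparisons, a pairwise difference is declared significant only if its p value falls below roughly .008, which is why p<.005 is reported as significant.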
Novices

Consistent with the assumption that novices cue on surface features when
making similarity judgments, Comparison Type did influence novices'
performance, F(3,132)=[illegible], p<.0001 (means are in Table 1). All pairwise
comparisons differed significantly by p<.0002 (.05/6 = .008). These
results indicate that surface features play a major role in novices'
categorization schemes, and directly influence the process by which they
decide whether or not two problems would be solved similarly. For example, if
one comparison problem matched the model problem in both surface features and
deep structure, then the decision that they would be solved similarly was
facilitated ([illegible]% correct for S-SD and N-SD comparison types, versus
[illegible]% correct for S-D and N-D comparison types). However, if a
comparison problem matched the model problem only in surface features, then
the decision that they would be solved similarly was adversely affected
([illegible]% correct for the S-D and S-SD pairings, versus 77% correct for
the N-D and N-SD pairings).

Despite their attraction to surface features as a means of judging
similarity of solution, novices as a group do not seem to rely solely on
surface features. For S-D items, in which novices should have been most prone
to ignoring deep structure, the 95% confidence interval (CI) was [illegible],
well above the predicted 0% correct had they used a strict surface feature
categorization scheme. Further, in the N-D items, where there was no
distraction due to surface features, the proportion of deep structure matches
was significantly above the predicted 50% performance (95% CI = 63%<M<71%).

For both novices and experts, performance was influenced by Model
Problem, F(7,371)=8.46, p<.0001. Items involving angular momentum were the
most difficult, while those involving energy tended to be easier. In all but
one of the 8 Model Problems, experts made more deep structure judgments than
did novices, producing an interaction between Model Problem and Group,
F(7,371)=3.05, p=.0039. There was also a significant interaction of Model
Problem and Comparison Type, F(21,1113)=13.33, p<.0001, suggesting that the
difficulty of making a decision based on deep structure in the various
conditions is related to the context of the problem.

The extent to which novices appear to rely on deep structure when making
categorization decisions is related to their problem solving ability as
measured by the problem solving task. The correlation between total
categorization score and the score on the problem solving skills test was .30,
F(1,43)=4.376, p=.0424. Further, performance on the problem solving task
supports the notion that novices were better able to select deep structure
matches on the similarity judgment task in problem contexts they understood
better. More specifically, subjects performed poorly on both the
problem solving task and the similarity judgment task on problems involving
angular momentum, whereas they performed relatively well on
problems involving energy.
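
The reported correlation and F statistic can be checked against each other. For a simple correlation tested with 1 and n-2 degrees of freedom, F = r^2 (n-2) / (1 - r^2), so r can be recovered from F. The sketch below applies this standard identity to the reported values; the function names are hypothetical, and the positive sign of r is taken from the direction described in the text.

```python
import math


def r_from_f(f_value, df_error):
    """Magnitude of the correlation implied by F(1, df_error)."""
    return math.sqrt(f_value / (f_value + df_error))


def f_from_r(r, df_error):
    """F(1, df_error) implied by a simple correlation r."""
    return (r ** 2) * df_error / (1 - r ** 2)


# Reported values: F(1,43) = 4.376 for the 45 novices (df = 45 - 2).
r = r_from_f(4.376, 43)
variance_explained = r ** 2

print(round(r, 2), round(variance_explained, 3))
```

The recovered value rounds to the reported .30, which corresponds to roughly 9% of the variance in problem solving scores being shared with categorization scores.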
Discussion

The findings of Experiment 1 are consistent with the existing literature
in indicating that the schemes used by subjects to categorize classical
mechanics problems are related to problem solving expertise in physics. This
is most obviously reflected by the greater reliance of experts on deep
structure similarity in making categorization decisions, and of novices on
surface feature similarity. Experts were much more likely to judge that two
problems would be solved similarly if they were similar in deep structure. In
contrast, novices often indicated that problems with similar surface features
would be solved similarly. However, the likelihood that both expert and
novice subjects will select the deep structure alternative is influenced by
what other problem attributes were present in the comparison problems. Among
novices, and to some extent among experts, this performance pattern could be
interpreted in terms of a threshold-type model (Smith, Shoben and Rips, 1974).

If the initial perception of similarity of one of the comparison
problems to the model problem was high, a threshold model would predict that
the subject would be inclined to make a response based on this overall
impression of similarity, without conducting any further analysis. Hence, we
note the relatively high rate of choosing the surface feature alternative in
the S-D and N-SD comparison types, where surface features were pitted against
alternatives that had no obvious superficial similarity to the model problem.
If neither comparison problem succeeded in crossing the threshold of
similarity (as in the N-D items), subjects were forced to consider more
carefully what would constitute similarity, and hence, might be more likely to
consider principles.

Is it clearly beneficial for novices to consider principles in
categorizing problems merely because experts appear to do so? Experiment 1
suggests the answer to this question may be yes, since there was a correlation
between novices' problem solving and categorization scores. Novices as a group
varied considerably in the degree to which they judged problems as similar
that were matched in deep structure. Better novice problem solvers made more
similarity judgments based on deep structure than did poorer novice problem
solvers. Thus, in the domain of physics, the ability to categorize problems
according to deep structure seems to be beneficial to problem solving.

In Experiment 2, we will consider the issue of individual differences
among novices more carefully. In doing so, we will need to clarify what types
of reasoning lead to particular categorization responses, since the binary
nature of the responses required in Experiment 1 did not make the subjects'
reasoning explicit. For example, the assumption that novices who chose the
deep structure alternative did so as a result of actually considering the
problems' deep structure may not be valid. On the other hand, novices may
have attempted to use deep structure more often than their actual performance
indicates, but may not have been able to do so correctly. Therefore, in
Experiment 2, we modified the similarity judgment task in an effort to make
subjects' reasoning more explicit, as well as further explore the relationship
between problem solving and problem categorization.

EXPERIMENT 2: CATEGORIZATION CRITERIA AND PROBLEM SOLVING ABILITY OF NOVICES 

Study of the ends of the spectrum of problem solving skill, namely
experts and novices, indicates there is a relationship between categorization
criteria and problem solving ability. The relevance of this finding for
understanding the development of expertise would increase if this relationship
could also be demonstrated among novices at the same level of experience.
Such a demonstration would indicate that skill acquisition is influenced from
the beginning by the types of cues to which novices try to pay attention, and
that the foundations for the acquisition of expertise are laid early in the
learning process. Instruction which attempts to facilitate use of general
principles may be more effective than instruction which ignores it.

In fact, the correlation between frequency of deep structure decisions
and problem solving score in Experiment 1, as well as related research (such
as Silver, 1979), suggests a more conclusive demonstration is possible. In
order to demonstrate that categorization criteria and problem solving ability
are related for novices at the same level of experience, we need a task which
allows us to examine more directly the reasons subjects have for making
categorization decisions. Simply inferring subjects' reasons for responses,
as in Experiment 1, may be misleading; for example, a misidentified
principle may have led to an incorrect response, even though the novice was
cuing on deep structure.

Therefore, we simplified the categorization task used in Experiment 1,
such that there was only one comparison problem presented with the model
problem in each item. This led to four comparison types: 1) S, 2) D, 3) SD,
and 4) N. The comparison type nomenclature now denotes the problem attributes
shared by the two problems in each item; for example, in the SD items both
problems shared the same surface features and deep structure. Subjects were
asked to decide whether or not the two problems would be solved similarly.
They were also asked to give a reason for each response. The reasons given
would provide the basis for separating subjects into groups according to the
criteria they used to categorize problems. This would allow us to determine
whether different patterns of responses are associated with different types of
reasoning. Subjects using the appropriate deep structure reasoning should
respond according to the following pattern on the four comparison types: S-No,
D-Yes, SD-Yes, and N-No. Subjects using surface feature reasoning should
respond similarly on the SD and N items, but in the opposite manner on the S
and D items, resulting in 50% correct overall.
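
The predicted yes/no patterns for the two idealized reasoning strategies can be written out explicitly. The following sketch is illustrative only; `ITEM_TYPES` and `predicted_response` are hypothetical names, and the model simply assumes a subject answers "Yes" whenever the two problems share the attribute that subject relies on.

```python
# For each comparison type: (shares_surface_features, shares_deep_structure).
ITEM_TYPES = {
    "S":  (True, False),
    "D":  (False, True),
    "SD": (True, True),
    "N":  (False, False),
}


def predicted_response(item_type, strategy):
    """Predicted answer to 'would these be solved similarly?' for an
    idealized subject using a single strategy ('deep' or 'surface')."""
    shares_surface, shares_deep = ITEM_TYPES[item_type]
    shared = shares_deep if strategy == "deep" else shares_surface
    return "Yes" if shared else "No"


# Deep structure reasoning defines the "correct" response for each type.
correct = {t: predicted_response(t, "deep") for t in ITEM_TYPES}
surface = {t: predicted_response(t, "surface") for t in ITEM_TYPES}

# Fraction of item types on which pure surface reasoning is scored correct.
surface_accuracy = sum(surface[t] == correct[t] for t in ITEM_TYPES) / len(ITEM_TYPES)

print(correct)
print(surface)
print(surface_accuracy)
```

Because each model problem appears once with each comparison type, the two strategies agree on the SD and N items and disagree on the S and D items, which gives the 50% figure for pure surface feature reasoning.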



Method 



Subjects 

Forty-four undergraduate students at the University of Massachusetts who
had completed a first semester physics course participated in this study.
They performed the categorization task and a problem solving task, which
included a mathematics proficiency component, and were paid. To have baseline
data against which to compare the novice data, 7 expert Ph.D. physicists also
performed the categorization task and were also paid for their participation.

Categorization Task

The word problems used in the categorization task were similar in type
and difficulty to those used in Experiment 1. Two word problems were paired
for each of the 32 items, one model problem and one comparison problem of type
S, D, SD or N. There were eight model problems, each of which appeared four
times, once with each of the four comparison problems. A response was
considered "correct" if it was that expected (as defined earlier) when
appropriate deep structure reasoning was used.

The task was presented in a booklet, with two problems per page.
Subjects were instructed to decide whether or not the two problems would be
solved similarly, and to respond by stating yes or no. They were then to
provide a reason for their response. One hour was allowed for the task, and
all subjects finished within this time limit.

Each of the reasons subjects gave was classified according to the
following (non-mutually exclusive) characteristics: surface features,
equation-based, physics terminology-based, and principles. Subjects were
classified into three groups on the basis of the type of reasoning most
frequently employed: 1) Surface Feature, 2) Principle, or 3) Mixed.
Classification into either the Surface Feature group or the Principle group
meant the subject had considered the same type of information on 17 or more of
the 32 items. The members of the Mixed group employed a variety of reasoning
strategies, none of which was used a majority of the time; they commonly
employed equation-based or physics terminology-based reasoning on a large
proportion of the items. In the novice group, 17 subjects were in the Surface
Feature Group, 11 subjects were in the Principle Group, and 16 subjects were
in the Mixed Group. All 7 expert subjects were primarily principle users.
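
The grouping rule described above (17 or more of the 32 items) can be expressed as a short sketch. This is an illustration under stated assumptions, not the coding procedure actually used: the code labels, the function name `classify_subject`, and the handling of a subject who reaches the threshold on both surface features and principles (possible because the codes are not mutually exclusive, and not addressed in the text) are all hypothetical.

```python
from collections import Counter

N_ITEMS = 32
MAJORITY = 17  # "17 or more of the 32 items"


def classify_subject(reason_codes):
    """Assign a subject to a reasoning group.

    reason_codes: one set of codes per item, drawn from
    {"surface", "equation", "terminology", "principle"} (labels assumed).
    """
    counts = Counter(code for codes in reason_codes for code in set(codes))
    uses_principles = counts["principle"] >= MAJORITY
    uses_surface = counts["surface"] >= MAJORITY
    if uses_principles and not uses_surface:
        return "Principle"
    if uses_surface and not uses_principles:
        return "Surface Feature"
    # Neither type reaches a majority, or (an assumption of this sketch,
    # not stated in the source) both do.
    return "Mixed"


principle_user = [{"principle"}] * 20 + [{"equation"}] * 12
surface_user = [{"surface"}] * 17 + [{"terminology"}] * 15
mixed_user = [{"surface"}] * 10 + [{"equation"}] * 10 + [{"terminology"}] * 12

print(classify_subject(principle_user))
print(classify_subject(surface_user))
print(classify_subject(mixed_user))
```

Each example list contains 32 entries, one per item, so the threshold of 17 corresponds to a strict majority of a subject's responses.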

Problem Solving Task 

The problem solving task contained four problems, which were the same as
the single-principle problems given in Experiment 1. This task was a portion
of a longer task which assessed both physics knowledge and mathematics
proficiency. One hour was allotted for completion of the entire problem
solving task.

Each of the four physics problems was graded on a ten point scale.
Scores on these four problems ranged from 0 points to 34 points, with a mean
of 12.4 points and standard deviation of 10.28 points. The math proficiency
portion covered topics in algebra, graphing, vectors, trigonometry and
geometry. Scores on the math portion ranged from 14 to 40 out of 40 possible
points, with a mean of 29.5 points and standard deviation of 6.4 points.



Results



The performances of the 44 novices and 7 expert physicists on the
categorization task were compared in a 2 (Groups) x 4 (Comparison Types) x 8
(Model Problems) ANOVA. As in Experiment 1, experts made more correct
decisions on the basis of matching deep structure (95%) than did the novices
as a group (62%), F(1,48)=[illegible], p<.0001, or any of the three novice
subgroups (Surface Feature Group = 56%, Mixed Group = 63%, Principle Group =
69%), F(1,22)=113.35, p<.0001, F(1,21)=131.26, p<.0001, and F(1,16)=73.63,
p<.0001. However, on average, novices made many more decisions that were
correct on the basis of deep structure than one might expect; the performance
of the novice subjects, at 62% correct, was significantly higher than the 50%
correct predicted for novices if we assume they employ only surface features
in categorization decisions, 95% CI = 59%<M<63%.

Comparison Type. For novices, but not experts, categorization
performance was influenced by Comparison Type, as indicated by a 3 (Novice
Groups) x 4 (Comparison Types) x 8 (Model Problems) ANOVA, F(3,129)=132.71,
p<.0001, and by a 4 (Comparison Types) x 8 (Model Problems) ANOVA for experts,
F(3,18)=1.30, p=.3061. Novice performance on each Comparison Type differs
from that of every other type at a level of p<.001 (EF/k=.05/6=.008). As can
be seen in Table 2, subjects experienced the most difficulty in correctly
rejecting S comparison problems as being appropriate matches to the model
problem. This result, in combination with the high rate of correct acceptance
of the SD comparison problems, supports our findings from Experiment 1 by
indicating the relevance that novices attach to surface features in making
decisions about solution similarity. In both experiments, the presence of
surface feature similarity depressed the rate of deep structure decisions when
it was uncorrelated with deep structure similarity (as in the S-D and S-SD
items in Experiment 1, and the S items in this experiment) and increased the
rate when it was correlated (as in the N-SD items in Experiment 1, and SD
items in this experiment).

The importance of surface feature similarity for novices does not mean
deep structure is not considered, as we suggested in Experiment 1. In
Experiment 2, novices were much better at correctly accepting D comparisons
than expected (50% actual versus 0% predicted). When there was no competition
from surface features, subjects were much more capable of making correct
decisions involving deep structure. In cases where there was neither surface
feature nor deep structure similarity (i.e., N comparisons), subjects were
reasonably good at assessing the lack of any similarity and making a correct
rejection (90% correct).

Reasoning Employed. As expected, experts nearly universally (93% of the
time) provided reasons for their judgments of similarity that were based on
physics principles. The principles involved were identified correctly 98% of
the time. Clearly, experts reason primarily on the basis of deep structure,
as their responses in both Experiments 1 and 2 indicate, and do so
appropriately.

Novices differ from experts and from each other in the degree to which
they utilize principles in their reasoning. On average, members of the
Principle Group mentioned principles 70% of the time, members of the Mixed
Group 23% of the time, and members of the Surface Feature Group 6% of the
time, so there were major differences among the groups in proclivity to employ
principles in reasoning. These principles were identified correctly by the
three groups 60%, 61%, and 62% of the time, respectively. Thus, when novices
chose to utilize principles in an explanation, there was no difference among
the three groups in the rates of correct identification, although there were
marked differences in the frequency of using principles among the three
groups.

The tendency to employ principles in reasoning is related to overall 
success in categorization. The Surface Feature Group, with 56% correct, 
performed significantly lower than both the Mixed Group, with 63% correct, 
F(1,31) = 6.73, p = .0143, and the Principle Group, with 69% correct, 
F(1,26) = 15.64, p = .0005. The three groups also tended to have difficulty with 
different types of problems, as indicated by an interaction of Group and 
Comparison Type, F(6,123) = 3.08, p = .0077. 

As can be seen in Table 2, the relative difficulty of the four 
Comparison Types was the same for all three groups. The performances on the SD 
and N items did not differ significantly among the groups, which is consistent 
with the 100% correct predicted performance for both surface feature and 
principle users. It was on the S and D items that the differences among the 
groups appeared. The Surface Feature Group performed lower than the Principle 
Group on both S and D items, t(26) = 3.10, p = .0046 and t(26) = 3.23, p = .0033. We 
had predicted surface feature users would make correct responses 0% of the 
time on these two Comparison Types, while principle users would be correct 
100% of the time. Thus, the performances of these groups were in the 
predicted directions. 
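
The idealized predictions referred to above can be written out as a small sketch. It assumes, from the way the comparison types are used in the text, that S items share only surface features with the model problem, D items share only deep structure, SD items share both, and N items share neither, and that a response is correct when it agrees with deep structure similarity. The function and variable names are hypothetical and are not taken from the study.

```python
# Hypothetical sketch of the idealized predictions; names are illustrative only.
# Each comparison type maps to (surface features match, deep structure matches).
COMPARISONS = {
    "S": (True, False),
    "D": (False, True),
    "SD": (True, True),
    "N": (False, False),
}


def predicted_percent_correct(strategy):
    """Predicted percent correct per comparison type for a pure strategy.

    strategy is "surface" (judges by surface features alone) or
    "principle" (judges by deep structure alone).
    """
    result = {}
    for name, (surface, deep) in COMPARISONS.items():
        # The subject answers "solved similarly" according to the cue they rely on.
        answer = surface if strategy == "surface" else deep
        # Assumption: the correct answer depends on deep structure alone.
        result[name] = 100 if answer == deep else 0
    return result


print(predicted_percent_correct("surface"))
print(predicted_percent_correct("principle"))
```

Under these assumptions a pure surface feature user is predicted to be wrong on every S and D item and right on every SD and N item, while a pure principle user is predicted to be right throughout, which matches the predicted values quoted in the text.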

Clearly, although the performance of the Principle Group was much better 
than that of the average novice, their performance was far from that predicted 
for one who relies on principles alone. Two factors contribute to this 
outcome: 1) principles were not used in every problem analysis, and 2) the 
principles identified were often inappropriate. Of the 70% of the time that 
members of the Principle Group used principles, 38% of the time they 
identified principles incorrectly. Hence, what may be more important than the 
appropriateness of a principle in the development of expertise is the 
frequency with which one attempts to apply principles to a problem analysis. 

The data argue that attempted principle use is related to problem 
solving ability. Mean performances on the problem solving task were 14%, 32%, 
and 57% correct for the Surface Feature, Mixed, and Principle Groups 
respectively. The three groups were significantly different on this measure, 
F(2,41) = 16.19, p < .0001, and each group differed from the others at a level of 
p < .008. The correlation between the frequency of attempts of an individual to 
reason by principle and the problem solving test score was .63, 
F(1,42) = 27.657, p < .0001. One might argue that a third factor, such as 
intelligence, is responsible for this relationship. However, even when level 
of mathematics proficiency (which we take as an index of intelligence) was 
held constant, the correlation was still significant, r = .505, Z = 3.516, 
p < .0004. Novices who attempt to categorize problems using principles tend to 
be better problem solvers. 
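
The two correlation tests above can be checked approximately from the reported values. The sketch below assumes a sample size of 44, inferred from the error degrees of freedom in F(1,42), and assumes that the Z statistic comes from a Fisher z transformation of the partial correlation with one covariate. The text does not say which test was used, so this is an illustration, not a reconstruction of the authors' analysis.

```python
import math

# Reported values from the text.
r = 0.63           # correlation between principle use and problem solving score
df_error = 42      # error degrees of freedom from F(1, 42)
r_partial = 0.505  # correlation with mathematics proficiency held constant

# F test for a simple correlation: F = r^2 * df / (1 - r^2).
f_stat = r**2 * df_error / (1 - r**2)

# Assumption: n = df_error + 2 subjects, with k = 1 covariate partialled out.
n = df_error + 2
k = 1

# Fisher z test for a partial correlation: Z = atanh(r) * sqrt(n - 3 - k).
z_stat = math.atanh(r_partial) * math.sqrt(n - 3 - k)

print(f"F(1,{df_error}) = {f_stat:.2f}")
print(f"Z = {z_stat:.3f}")
```

Both statistics come out close to the reported figures; the small difference in F is consistent with the correlation having been rounded to two decimal places.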

Discussion 

Experiment 2 demonstrates that the relationship between use of 
principles in categorization and problem solving skill is not only an 
appropriate characteristic for making distinctions between experts and 
novices, but is also appropriate for distinguishing between "good" and "poor" 
novice physics students with similar educational experiences. Novices who 
attempt to analyze mechanics problems using principles make more correct 
judgments concerning solution similarity and are better problem solvers. Note 
that these novices were often incorrect in identifying the principle needed to 
solve a problem, but that the principle-based approach to problem 
categorization generally appears to have a value beyond the success of 
the attempt to classify a problem. 

Why is principle use so highly correlated with problem solving ability? 
We believe that storing information about types of physics problems in terms 
of general principles, as opposed to equations and surface features which 
novices generally employ (Chi, et al., 1981; Mestre & Gerace, 1986), is a much 
more efficient form of representing physics knowledge. The effort required to 
organize physics knowledge in terms of broad categories is probably initially 
much greater than that required to organize the knowledge in terms of 
equations. However, the effort involved in maintaining categorical 
information is much less than that needed to maintain and search through a 
large equation data base. Therefore, our findings indicate that three months 
after finishing their mechanics course, principle users could solve problems 
more effectively than non-principle users. Although this study cannot address 
the causal relation between principle use and problem solving skill, it 
suggests that pedagogy in physics might be more effective if attempts were 
made to convey information in a manner conducive to organization by 
principles, a view supported by other research as well (Eylon & Reif, 1984; 
Heller & Reif, 1984). 

General Discussion 

In these two studies, we attempted to characterize how and when novice 
physics students and expert physicists use surface features and deep structure 
to determine that two problems would be solved similarly, and how 
categorization and problem solving skills are related. In agreement with 
other studies (Chi, et al., 1981; Schoenfeld & Herrmann, 1982), we found the 
presence of surface features to adversely influence the categorization 
decisions of novices, and of experts to some extent as well. Despite the 
apparent difficulty in ignoring the semblance of similarity conveyed by 
surface features, the conclusion that novices focus almost exclusively on 
surface feature similarity is unwarranted. 

Two pieces of information argue against such a conclusion. First, in 
conditions where surface feature similarity was not available to "assist" in a 
decision, the performance of novices was much better than that expected if 
they had been relying solely on surface feature similarity. Second, when 
novices were asked to state why they believed two problems would be solved 
similarly, many of them responded with arguments based on principles, although 
equation-based reasoning was also fairly common. 

Novices are not a uniform crowd. Some do rely primarily on surface 
feature similarity to categorize problems, while others attempt to reason 
fairly consistently by principles. It is not at all clear that a picture of 
the novice progressing from reliance on surface features to reliance on deep 
structure is accurate. Novices who are better problem solvers, and presumably 
the ones more likely to continue in the field, tend to apply principles more 
often when deciding whether or not two problems would be solved similarly. 
Surface features interfere with the decision, but are not the primary focus of 
attention for the good problem solvers. As the novice becomes more able to 
distinguish the critical attributes of problems, surface feature similarity 
has less influence on problem analysis. 

These results suggest that our goal as educators should be to structure 
the information presented in the classroom in a way that assists the learner 
in organizing knowledge by principles. There is evidence that standard 
pedagogical practices do not incorporate this strategy (Collins, Brown & 
Newman, in press). We may not be able to ensure that every student views 
principles as fundamental, but the centrality of principles to both experts 
and the better novices suggests that this path is more likely to lead to 
eventual understanding. 
Table 1: Predicted and Observed Performance of Experts and Novices 

                          Experts                  Novices 
Comparison Type     Predicted   Observed     Predicted   Observed 

S-D                   100%        66%           0%         26% 
S-SD                  100%        71%          50%         54% 
N-D                   100%        84%          50%         67% 
N-SD                  100%        91%         100%         87% 

Total                 100%        78%          50%         59% 



Table 2: Performance of 3 groups on 4 comparison types 

          Surface Features    Mixed    Principle 

S               18%            23%        43% 
D               39%            53%        64% 
SD              76%            87%        78% 
N               90%            90%        91% 

Total           56%            63%        69% 